Two reduced rank classification procedures, principal components
classification and equal weights classification, are described and compared
via a simulation study to the standard classification procedure to determine
their feasibility as alternative classification procedures. First, a
justification for the development of these two reduced rank procedures is
provided. Then, the two reduced rank rules are derived. The simulation
design is described in detail. The simulation results demonstrate that the
reduced rank procedures are preferable to the standard procedure under
certain conditions, i.e., when they appropriately incorporate prior
information about population structure into their classification rules.
Suggestions for future research are offered.

Reduced Rank Classification
Introduction 

A recurring problem common to many applied disciplines is that of
assigning an individual to one of two or more distinct groups on the basis
of the resemblance of that individual's scores on a set of measures to
group profiles composed of the same set of measures. Classification is one
name that is often assigned to this problem. Industrial psychology,
particularly personnel psychology, is a fertile source of classification
problems. The assignment of an applicant to a subgroup on the basis of his
or her scores on a test battery containing measures of mechanical aptitude,
clerical skills, psychomotor abilities, and vocational interest is a prime
example of a classification problem. Mastery testing is yet another basic
classification problem. Sometimes, classification problems are disguised,
appearing under different labels. When the criterion is the dichotomous
variable of group membership in either the successful or unsuccessful
group, the validation of a selection procedure can be recast as a
classification problem. In fact, the classification framework is preferred
whenever it is desirable to incorporate differential costs of
misclassification into the decision process.

Given a set of well-defined, mutually exclusive subgroups, the basic
classification strategy is to assign an individual to the subgroup that he
or she most resembles. Various mathematical definitions of "resemblance"
and associated classification rules have been developed and examined. A
subset of these rules has been introduced to the applied psychologist via
books and articles written by Huberty (1975), Overall and Klett (1972), and
Tatsuoka (1971, 1974, 1975). Mathematical introductions to the
classification problem are contained in Anderson (1958) and Rao (1952, 1965).
A comprehensive but mathematically demanding review of both historical and
recent developments in classification analysis can be found in Das Gupta
(1973).



Standard Classification Rule for the Two Group Case

Classification into one of two multivariate normal subpopulations has
been studied extensively in the mathematical statistics literature
(Cacoullos, 1973). For two multivariate normal subpopulations with a
common covariance matrix, the sample "discriminant" function for the
standard density function approach to classification (Anderson, 1958; Rao,
1952, 1965) is

$$ W_s[x_i] = [x_i - .5(\bar{x}_1 + \bar{x}_2)]\, b' , $$   [1]

where $x_i$ is a 1-by-p vector of observations on the p-dimensional random
variable X for the ith individual, $\bar{x}_1$ and $\bar{x}_2$ are sample
centroids from subpopulations $k_1$ and $k_2$, respectively, and the 1-by-p
vector $b$ contains sample linear discriminant weights, which are obtained
via

$$ b = (\bar{x}_1 - \bar{x}_2)\, C^{-1} , $$   [2]

where $C$ is the sample pooled within groups covariance matrix.

Classifications are based on the quantity in Equation 1, which is
frequently referred to as the Wald-Anderson statistic. Let $q_1$ represent
an estimate of the prior probability of membership in subpopulation $k_1$,
or the "base rate". One plausible estimator of $q_1$ is the proportion of
individuals in the sample at hand that are from subpopulation $k_1$, i.e.,
the relative sample size. Likewise, let $q_2$ represent an estimate of the
prior probability of membership in subpopulation $k_2$. The basic
classification strategy goes as follows: Assign individuals with response
pattern $x_i$ to subpopulation $k_1$ if

$$ W_s[x_i] \ge \ln[q_2 / q_1] , $$

and to subpopulation $k_2$ otherwise. Note that the expression $\ln[z]$
represents the natural logarithm function evaluated at z.
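
The rule in Equations 1 and 2 can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `standard_rule` and `classify`, the simulated data, and the equal base rates are assumptions made for the example, not part of the original study.

```python
import numpy as np

def standard_rule(x1, x2, q1=0.5, q2=0.5):
    """Return (b, midpoint, cutoff) for the two-group rule of Equations 1 and 2."""
    xbar1, xbar2 = x1.mean(axis=0), x2.mean(axis=0)
    n_total = len(x1) + len(x2)
    d1, d2 = x1 - xbar1, x2 - xbar2
    # Pooled within-groups covariance matrix C (divisor N - 2).
    c = (d1.T @ d1 + d2.T @ d2) / (n_total - 2)
    # b = (xbar1 - xbar2) C^{-1}; C is symmetric, so a linear solve suffices.
    b = np.linalg.solve(c, xbar1 - xbar2)
    midpoint = 0.5 * (xbar1 + xbar2)
    return b, midpoint, np.log(q2 / q1)

def classify(x, b, midpoint, cutoff):
    """Assign 1 where W_s[x] >= cutoff, and 2 otherwise."""
    w = (x - midpoint) @ b
    return np.where(w >= cutoff, 1, 2)

# Hypothetical data: two groups of 40 cases on 3 predictors.
rng = np.random.default_rng(0)
x1 = rng.normal(loc=0.5, size=(40, 3))
x2 = rng.normal(loc=-0.5, size=(40, 3))
b, midpoint, cutoff = standard_rule(x1, x2)
assigned = classify(np.vstack([x1, x2]), b, midpoint, cutoff)
```

With equal base rates the cutoff is zero, so each sample centroid is assigned to its own group.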

In practice, it would be desirable to have and to use the classification
rule that yields optimal classification in the population. Unfortunately,
a researcher is usually limited to working with samples from the
population of interest and must settle for an estimate of the optimal rule
based on the sample data. Often these samples are small or moderate in
size. Under these conditions, a "sample-optimal" classification rule, such
as the rule described in the previous paragraph, might be developed that
"overfits" the original data, and which could, consequently, produce
severely suboptimal classifications in future samples from the same
population. Since a major goal in classification analysis is the correct
assignment of individuals of unknown origin, a rule developed in a sample
should be judged on the basis of its performance in the population rather
than its performance in the sample in which it was developed.

When the costs of misclassification are equal, the expected performance
of a sample-optimal classification rule in future samples can be expressed
as the probability of misclassification associated with use of that rule in
the population, i.e., the actual error rate. In many ways, the actual
error rate is the most important error rate associated with a sample
classification rule. It is a direct measure of how well a sample
classification rule can be expected to classify future observations from the
population of interest.
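
For a linear rule applied to two normal subpopulations with a common covariance matrix, the actual error rate can be computed in closed form, because the classification statistic is itself normally distributed within each subpopulation. The sketch below assumes that standard normal-theory result; the function names and the small two-predictor population are hypothetical.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def actual_error_rate(b, midpoint, cutoff, mu1, mu2, sigma, q1=0.5, q2=0.5):
    """Population misclassification probability of the rule
    'assign to k1 if (x - midpoint) b' >= cutoff' (equal costs assumed)."""
    sd = math.sqrt(b @ sigma @ b)                 # s.d. of W in either group
    miss1 = normal_cdf((cutoff - (mu1 - midpoint) @ b) / sd)   # k1 sent to k2
    miss2 = normal_cdf(((mu2 - midpoint) @ b - cutoff) / sd)   # k2 sent to k1
    return q1 * miss1 + q2 * miss2

# Hypothetical population with generalized distance 4 between centroids.
mu1 = np.zeros(2)
mu2 = np.array([2.0, 0.0])
sigma = np.eye(2)
b_opt = np.linalg.solve(sigma, mu1 - mu2)         # population-optimal weights
rate = actual_error_rate(b_opt, 0.5 * (mu1 + mu2), 0.0, mu1, mu2, sigma)
```

When the population-optimal weights are used, the rate reduces to the normal distribution function evaluated at minus half the square root of the generalized distance. A sample-based rule evaluated the same way can only do as well or worse.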

Concern for the actual error rate associated with a sample-optimal
classification rule is akin to interest in the cross-validity of a sample
least squares regression equation. The poor cross-validities obtained for
least squares regression equations developed in small to moderate sized
samples have led to a surge of interest in reduced rank regression
procedures (Herzberg, 1969; Einhorn and Hogarth, 1975; Dorans and Drasgow,
1978) and other biased regression procedures (Winer, 1978) as alternatives
to ordinary least squares regression. Reduced rank procedures have been
demonstrated to be superior to ordinary least squares regression under
certain conditions. For example, Dorans and Drasgow (1978) found that both
equal weights regression and principal components regression
cross-validated better than least squares regression in populations
characterized by knowledge of a structure among the predictors and knowledge
about the directionality of predictor-criterion relationships. The success of
reduced rank regression procedures suggests that reduced rank
classification procedures might also be successful as alternatives to
classification directly on the basis of the original predictors.

Purpose

Previously (Dorans, 1979), two reduced rank classification procedures
were developed and their viability examined as alternatives to the standard
full rank classification procedure described earlier. In the full
rank procedure, the classification analysis is performed in the complete
p-dimensional predictor space. In contrast, for both reduced rank
classification procedures, the classification analyses are performed in
subspaces of reduced dimensionality. In principal components classification,
the analysis is performed in the space of the r (< p) largest components of a
standardized estimate of the total predictor covariance matrix. In equal
weights classification, the analysis is performed along a single dimension.
Computationally, the two reduced rank procedures require the replacement of
Equation 1 with its reduced rank counterparts.

The basic rationale for reduced rank procedures is that in the many
instances where the "effective dimensionality" of a predictor battery is
smaller than its "apparent dimensionality", the information lost by
discarding dimensions is predominantly sample specific noise that, if used,
would produce classification rules with large actual error rates. To
borrow a phrase from the literature on alternatives to ordinary least squares
regression, the reduced rank rules should, under certain conditions,
"cross-validate" better than the standard full rank rule, i.e., yield lower
actual error rates.

In the balance of this article, the two reduced rank classification
procedures are presented, and the details of a simulation study, designed
to identify some conditions under which each type of classification
procedure can be expected to yield acceptable actual error rates, are
described. Then, the results of the simulation are presented and discussed,
and suggestions are given for future research.

Classification on the r Largest Components 
Tucker (1978) adapted the logic of principal components regression to 
multiple group discriminant analysis. In the process, he addressed a num- 
ber of interesting theoretical questions posed by this adaptation. In the 
remainder of this section, two points made by Tucker that have implications 
for the two subpopulation classification problem are mentioned. Then, the 
logic of Tucker's reduced rank approach is adapted to the two subpopulation 
classification problem. 

The first step in principal components regression is to perform a
components analysis on the predictor intercorrelation matrix. In multiple
group discriminant analysis, there are three distinct covariance matrices
that are related via

$$ \Sigma_{xx} = \Sigma_{w} + \Sigma_{b} , $$

where $\Sigma_{xx}$ is the total covariance matrix, $\Sigma_{w}$ is the
within groups covariance matrix and $\Sigma_{b}$ is the between groups
covariance matrix. The existence of three covariance matrices requires
resolution of the following question: Which matrix should be the object of
rank reduction? Tucker addressed this question analytically and empirically,
concluding that rank reduction should be performed on a total covariance
matrix.

Given that a total covariance matrix is the appropriate object for rank
reduction, a new question arises: How does one estimate the population
total covariance matrix $\Sigma_{xx}$ from sample quantities? Tucker also
addressed this question, concluding that the appropriate estimate of
$\Sigma_{xx}$ depends upon the nature of the sampling process that yields
the final sample of observations. He distinguished between stratified random
sampling and complete random sampling.

Under stratified random sampling, entities are randomly sampled from
each subpopulation under the constraint that the relative sample sizes are
equal to their relative population sizes, i.e.,

$$ n_k / N = \pi_k , \quad k = 1, 2, \ldots, K , $$   [4]

where $\pi_k$ denotes the relative population size of subpopulation k.
For this type of sampling, Tucker (1978) derived the following estimate of
$\Sigma_{xx}$:

$$ \mathrm{Est}_S[\Sigma_{xx}] = [BG + (N - K)^{-1}(N - K + 1)(WG)]\, N^{-1} , $$   [5]

where BG is the sample between groups sums of products matrix, WG is the
sample pooled within groups sums of products matrix, N is the total sample
size, and K is the total number of subpopulations. Note that
$\mathrm{Est}_S[\Sigma_{xx}]$ does not equal the total sums of products
matrix T divided by total sample size. Instead, T and
$\mathrm{Est}_S[\Sigma_{xx}]$ are related via

$$ \mathrm{Est}_S[\Sigma_{xx}] = [T + WG/(N - K)]\, N^{-1} . $$   [6]

No constraints are placed on the relative sample sizes under complete
random sampling. Entities are sampled randomly from the population without
concern for the representativeness of the sample's final composition.
Under the assumption of complete random sampling, Tucker (1978) derived the
following estimate of $\Sigma_{xx}$:

$$ \mathrm{Est}_C[\Sigma_{xx}] = [BG + WG]\,[N - 1]^{-1} , $$   [7]

the familiar unbiased estimate of the total covariance matrix.
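
As an illustration, the two estimates in Equations 5 and 7 can be computed directly from the between groups and within groups sums of products matrices. The helper names and the simulated data below are assumptions made for this sketch only.

```python
import numpy as np

def sums_of_products(groups):
    """Return (BG, WG, T): between, pooled within, and total SSCP matrices."""
    all_x = np.vstack(groups)
    grand = all_x.mean(axis=0)
    p = all_x.shape[1]
    bg = np.zeros((p, p))
    wg = np.zeros((p, p))
    for g in groups:
        d = g.mean(axis=0) - grand
        bg += len(g) * np.outer(d, d)        # weighted centroid deviations
        dev = g - g.mean(axis=0)
        wg += dev.T @ dev                    # deviations about own centroid
    t = (all_x - grand).T @ (all_x - grand)
    return bg, wg, t

def est_stratified(bg, wg, n, k):
    """Equation 5: estimate under stratified random sampling."""
    return (bg + (n - k + 1) / (n - k) * wg) / n

def est_complete(bg, wg, n):
    """Equation 7: estimate under complete random sampling."""
    return (bg + wg) / (n - 1)

# Hypothetical data: K = 2 groups on p = 3 predictors.
rng = np.random.default_rng(0)
groups = [rng.normal(size=(12, 3)), rng.normal(loc=1.0, size=(8, 3))]
n, k = 20, 2
bg, wg, t = sums_of_products(groups)
est_s = est_stratified(bg, wg, n, k)
est_c = est_complete(bg, wg, n)
```

Because T = BG + WG, the stratified estimate computed this way also satisfies the relation in Equation 6, and the complete sampling estimate coincides with the ordinary sample covariance matrix of the pooled data.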



The mathematics of Tucker's reduced rank approach to the multiple group
discriminant problem is simplified in the two subpopulation case. In the
balance of this section, the logic of reduced rank discriminant analysis is
adapted to the two subpopulation classification problem under consideration
in the present research.

The first step in reduced rank classification is to convert the
estimate of the population total covariance matrix into a correlation matrix.
This standardization eliminates the effects of different units of
measurement for the p predictors. Let the p-by-p diagonal matrix $S^2$ be
defined as

$$ S^2 = \mathrm{Diag}(\mathrm{Est}[\Sigma_{xx}]) ; $$   [8]

then the standardization is accomplished via

$$ T = S^{-1}(\mathrm{Est}[\Sigma_{xx}])S^{-1} , $$   [9]

such that the diagonal elements of T are equal to one.
Next, a principal axis solution of T is obtained via

$$ T = V D^2 V' , $$   [10]

where $D^2$ is a p-by-p diagonal matrix of eigenvalues written in descending
order and V is the corresponding matrix of column eigenvectors. At this
point, the rank reduction occurs as some decision rule is used to retain r
roots. The diagonal matrix $D_r^2$ contains the r retained roots and the
p-by-r matrix $V_r$ contains the corresponding eigenvectors. So far, the
procedure just outlined parallels the principal components regression
procedure.
Next, the p-by-r transformation matrix F,

$$ F = S^{-1} V_r D_r^{-1} (N - 2)^{1/2} , $$   [11]

is applied to the scores on the p original predictors to obtain scores on
the r largest components,

$$ x_{ri} = x_i F , $$   [12]

where $x_{ri}$ is a 1-by-r generalized row vector of scores on the r
components. These r component scores are uncorrelated in the total sample.

Since the goal is classification on the basis of the r largest
components, it is necessary to develop an analogue of Equation 1 for the r
largest components. Hence, quantities analogous to $x_i$, $\bar{x}_1$,
$\bar{x}_2$, and $C$ are required. In other words, the scores on the r
largest components, the sample centroids on the r largest components, and the
component within groups covariance matrix are needed. The component scores
are defined in Equation 12. The remaining quantities can be readily obtained
via

$$ C_r = (N - 2)^{-1} F'[WG]F = F'CF $$   [13]

and

$$ \bar{x}_{rk} = \bar{x}_k F , \quad k = 1, 2 , $$   [14]

where $\bar{x}_{rk}$ is of order 1-by-r and $C_r$ is r-by-r.

The quantities in Equations 12, 13, and 14 can be combined to form a
sample classification function for the r largest components,

$$ W_s[x_{ri}] = [x_{ri} - .5(\bar{x}_{r1} + \bar{x}_{r2})]\, C_r^{-1} (\bar{x}_{r1} - \bar{x}_{r2})' . $$   [15]

The expression in Equation 15 can be expressed in terms of the p original
predictors. Using the relationships in Equations 12, 13, and 14, Equation
15 can be rewritten as

$$ W_s[x_{ri}] = W_s[x_i] = [x_i - .5(\bar{x}_1 + \bar{x}_2)]\, F (F'CF)^{-1} F' (\bar{x}_1 - \bar{x}_2)' . $$   [16]

In sum, when classification is performed on the r largest components,
the discriminant weights used in the classification rule are defined as

$$ b_r = (\bar{x}_1 - \bar{x}_2)\, F (F'CF)^{-1} F' , $$   [17]

which allows Equation 16 to be rewritten as

$$ W_s[x_i] = [x_i - .5(\bar{x}_1 + \bar{x}_2)]\, b_r' . $$   [18]

In contrast, the discriminant weights used when classification is performed
directly on the original predictors are those depicted in Equation 2. As
expected, when all p components are retained for classification purposes,
the resultant rule is the standard full rank rule since, for r = p,

$$ F = (N - 2)^{1/2} S^{-1} V_r D_r^{-1} = (N - 2)^{1/2} S^{-1} V D^{-1} $$   [19]

substituted into Equation 17 yields

$$
\begin{aligned}
b_r &= (\bar{x}_1 - \bar{x}_2)\,(N - 2)\,[S^{-1} V D^{-1}]\,[(N - 2)(D^{-1} V' S^{-1})\, C\, (S^{-1} V D^{-1})]^{-1}\,[D^{-1} V' S^{-1}] \\
    &= (\bar{x}_1 - \bar{x}_2)\,[S^{-1} V D^{-1}]\,(S^{-1} V D^{-1})^{-1} (C^{-1}) (D^{-1} V' S^{-1})^{-1}\,[D^{-1} V' S^{-1}] \\
    &= (\bar{x}_1 - \bar{x}_2)\, C^{-1} = b .
\end{aligned}
$$   [20]

In other words, the discriminant weights obtained via the principal
components procedure are identical to the discriminant weights used by the
standard full rank procedure when all p (r = p) components are retained for
the classification analysis.
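
The steps in Equations 8 through 17 can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the function name `pc_weights`, the simulated two-group data, and the use of the stratified estimate of Equation 5 with K = 2 are choices made for the example.

```python
import numpy as np

def pc_weights(xbar1, xbar2, c, est_sigma, n_total, r):
    """Discriminant weights b_r of Equation 17 for the r largest components."""
    s_inv = np.diag(1.0 / np.sqrt(np.diag(est_sigma)))    # S^{-1}, from Eq. 8
    t = s_inv @ est_sigma @ s_inv                         # Eq. 9
    roots, vectors = np.linalg.eigh(t)                    # roots come back ascending
    keep = np.argsort(roots)[::-1][:r]                    # indices of the r largest roots
    d_r_inv = np.diag(1.0 / np.sqrt(roots[keep]))         # D_r^{-1}
    f = np.sqrt(n_total - 2) * (s_inv @ vectors[:, keep] @ d_r_inv)   # Eq. 11
    c_r = f.T @ c @ f                                     # Eq. 13
    return (xbar1 - xbar2) @ f @ np.linalg.inv(c_r) @ f.T  # Eq. 17

# Hypothetical two-group sample on p = 4 predictors.
rng = np.random.default_rng(1)
p, n1, n2 = 4, 30, 30
x1 = rng.normal(loc=0.3, size=(n1, p))
x2 = rng.normal(loc=-0.3, size=(n2, p))
xbar1, xbar2 = x1.mean(axis=0), x2.mean(axis=0)
n_total = n1 + n2
wg = (x1 - xbar1).T @ (x1 - xbar1) + (x2 - xbar2).T @ (x2 - xbar2)
c = wg / (n_total - 2)                                    # pooled within covariance
diff = xbar1 - xbar2
bg = (n1 * n2 / n_total) * np.outer(diff, diff)           # between SSCP when K = 2
est_sigma = (bg + (n_total - 1) / (n_total - 2) * wg) / n_total   # Eq. 5, K = 2

b_full = np.linalg.solve(c, diff)                         # Eq. 2
b_r2 = pc_weights(xbar1, xbar2, c, est_sigma, n_total, r=2)
b_rp = pc_weights(xbar1, xbar2, c, est_sigma, n_total, r=p)
```

Retaining all p components reproduces the full rank weights of Equation 2, which is the numerical counterpart of the result in Equation 20; with r < p the weights generally differ.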
Equal Weights Classification

The success of equal weights regression (Dorans and Drasgow, 1978) led
the author to wonder whether an equal weights classification procedure
would also be useful under certain circumstances. Since an equal weights
classification procedure has not appeared in the literature, it was
necessary to devise a classification analogue to equal weights regression.

The first step in equal weights regression is to obtain a composite of
standardized predictors. Likewise, the first step in equal weights
classification should be the formation of a composite of standardized
predictors. Tucker's (1978) developments in reduced rank discriminant
analysis suggest that the predictors should be standardized with respect to
the total sample metric. Thus a standardization such as that depicted in
Equation 9 is required. Next, these standardized predictor scores are summed
to obtain composite scores for each individual. Let the 1-by-p
transformation vector $t$ be defined as

$$ t = 1\, S^{-1} , $$   [21]

where 1 is a 1-by-p row vector of ones and $S^2$ is defined in Equation 8.
Summing the standardized predictor scores is accomplished via

$$ x_{ti} = x_i t' , $$   [22]

where $x_{ti}$ is the score for the ith individual on the unit weighted
composite of the p standardized predictors.

Since the goal is development of a classification rule based on this
single equal weights composite, it is necessary to develop an analogue of
Equation 1 in terms of this composite. Thus, in addition to scores on the
composite, which are obtained via Equation 22, the within groups variance
and sample means on this composite are needed. The sample means can be
obtained readily via

$$ \bar{x}_{tk} = \bar{x}_k t' , \quad k = 1, 2 , $$   [23]

and the within groups variance can be obtained via

$$ s_t^2 = t\, C\, t' . $$   [24]

The quantities in Equations 22, 23, and 24 can be combined to form a
sample classification rule for the equal weights composite,

$$ W_s[x_{ti}] = [x_{ti} - .5(\bar{x}_{t1} + \bar{x}_{t2})]\, (s_t^2)^{-1} (\bar{x}_{t1} - \bar{x}_{t2}) . $$   [25]

By using the relationships in Equations 22, 23, and 24, Equation 25 can be
expressed in terms of the p original unstandardized predictors as

$$ W_s[x_{ti}] = W_s[x_i] = [x_i - .5(\bar{x}_1 + \bar{x}_2)]\, t' (t C t')^{-1} t\, (\bar{x}_1 - \bar{x}_2)' . $$   [26]

By letting

$$ b_t = (\bar{x}_1 - \bar{x}_2)\, t' (t C t')^{-1} t $$   [27]

define the discriminant weights used in classification on the basis of the
equal weights composite, Equation 26 can be rewritten as

$$ W_s[x_i] = [x_i - .5(\bar{x}_1 + \bar{x}_2)]\, b_t' . $$   [28]

In sum, two types of reduced rank classification rules have been
described in this section. In the principal components classification
procedure, a standard classification analysis is performed on the r largest
principal components of a standardized estimate of the population total
covariance matrix. The sample classification rule for this reduced rank
procedure involves Equations 17 and 18. In the equal weights
classification procedure, a standard classification analysis is performed on
a unit weighted composite of the standardized total group predictor scores.
The sample classification rule for this latter reduced rank procedure
involves Equations 27 and 28.
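
The equal weights rule is easy to verify on a small numerical example. The sketch below uses a hypothetical two-predictor case with made-up centroids and covariance matrices; the function name `equal_weights` is likewise an assumption for illustration.

```python
import numpy as np

def equal_weights(xbar1, xbar2, c, est_sigma):
    """Return (b_t, t, s2_t) following Equations 21, 24, and 27."""
    t = 1.0 / np.sqrt(np.diag(est_sigma))     # t = 1 S^{-1}, Eq. 21
    s2_t = t @ c @ t                          # within groups variance, Eq. 24
    b_t = ((xbar1 - xbar2) @ t) / s2_t * t    # Eq. 27
    return b_t, t, s2_t

# Hypothetical sample quantities for p = 2 predictors.
xbar1 = np.array([1.0, 2.0])
xbar2 = np.array([0.0, 0.0])
c = np.array([[2.0, 0.5], [0.5, 1.0]])           # pooled within covariance
est_sigma = np.array([[4.0, 1.0], [1.0, 1.0]])   # total covariance estimate
b_t, t, s2_t = equal_weights(xbar1, xbar2, c, est_sigma)

# Score one response pattern two ways: on the composite (Eq. 25)
# and on the original predictors (Eq. 28).
x = np.array([3.0, 1.0])
x_t, m_t1, m_t2 = x @ t, xbar1 @ t, xbar2 @ t
w_composite = (x_t - 0.5 * (m_t1 + m_t2)) * (m_t1 - m_t2) / s2_t
w_original = (x - 0.5 * (xbar1 + xbar2)) @ b_t
```

Both routes give the same classification statistic, and the weight vector is a scalar multiple of t, which is what restricts the analysis to a single dimension.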

Both these reduced rank classification procedures are being studied as
potential alternatives to standard classification analysis performed
directly on the p original predictors. Equations 1 and 2 are used with
the standard procedure. The rationale for studying these reduced
rank alternatives is that they may be less susceptible to derivation
sample idiosyncrasies, and that, consequently, they may produce lower
actual error rates than the standard full rank procedure.

Expected Performances of the Reduced Rank Procedures 

Under what conditions will reduced rank classification yield lower
actual error rates than classification performed directly on the basis of
the original predictors via the standard full rank procedure? This is an
empirical question. A priori, the expectation is that both reduced rank
procedures should perform best in samples drawn from structured populations
that are amenable to reduced rank description. In contrast, classification
on the basis of the original predictors should perform best in samples
drawn from populations characterized by random structure.

Two classes of simulated populations were generated to test these
a priori expectations or hypotheses. In one class of populations, the test
vectors and subpopulation centroids were placed in random directions within
an orthogonal reference system of dimensionality larger than the number of
predictors. Full rank classification, the standard procedure, was expected
to perform better than the reduced rank procedures in this class of
populations. The other class of populations was constructed within the
framework of the common factor model, such that relationships among the
observed predictors and group differences on the observed predictors were
accounted for by a small number of common factors. A priori, the reduced rank
procedures were given the edge in these populations.

Recall that there are two types of reduced rank classification
procedures under investigation: principal components classification and equal
weights classification. Under what conditions should one of these
procedures perform better than the other? This is also an empirical question.
To address it, two subclasses of structured populations were constructed.
In both subclasses, relationships among the observed predictors were
described by a small number of common factors, and subpopulation centroid
differences on the p observed predictors were due solely to subpopulation
differences on the common factor centroids. The orientation of these
common factor centroids was the feature distinguishing between the two
subclasses of structured populations. In the fully structured subclass, each
of the common factors contributed equally to subpopulation discrimination.
The a priori expectation was that this subclass of structured population
favors the equal weights classification procedure. In the other subclass
of structured populations, the orientations of the common factor centroids
in the subpopulations were randomly directed. Principal components
classification was given the edge in this partly structured subclass of
structured populations.
In sum, two classes of populations were constructed in this research to
address the usefulness of reduced rank classification procedures: random
and structured. Within the class of structured populations were two
subclasses: fully structured and partly structured. Standard full rank
classification was given the a priori edge in the random populations, while
the fully structured populations appeared most favorable to equal weights
classification. The principal components procedure was expected to perform
better than the other procedures in the partly structured subclass of
structured populations.

Simulation Design 

Random Populations 

The class of random populations was constructed via 16 randomly
directed vectors of unit length and two randomly directed centroid vectors of
variable length. First, consider the general case: constructing two
p-dimensional normal subpopulations, $k_1$ and $k_2$, that have the same
covariance matrix $\Sigma$, and centroids, $\mu_1$ and $\mu_2$, respectively,
for which the population generalized distance (Mahalanobis, 1936) is fixed
at a desired value.

One begins by forming a p-by-(p+r) matrix Z of random normal deviates,
where p > r. Each row of Z corresponds to one of the p observed
predictors. Each column of Z corresponds to one of p+r underlying orthogonal
dimensions. (The quantity p+r is not arbitrary: It is also the number of
underlying orthogonal dimensions in the structured populations.) Random
normal deviates are chosen to ensure that each of the p rows of Z
represents a random direction in a multivariate normal space of p+r
dimensions. The entries in Z are the weights describing the perfect
regression of the p observed predictors onto the p+r underlying orthogonal
dimensions. To set the variances of the p observed predictors equal to unity,
each of the p rows of Z is rescaled to unit length.

Let Z represent the p-by-(p+r) matrix of p randomly directed vectors of
unit length in a multivariate normal space of p+r orthonormal dimensions.
The common within groups covariance matrix can be expressed as

$$ \Sigma = ZZ' . $$   [29]

The p-dimensional centroids for subpopulations $k_1$ and $k_2$ are defined
via

$$ \mu_1 = \mu_{z1} Z' $$   [30]

and

$$ \mu_2 = \mu_{z2} Z' , $$   [31]

where $\mu_{z1}$ and $\mu_{z2}$ are 1-by-(p+r) centroid vectors that are
randomly directed in the p+r orthonormal space. These two centroid vectors
can be scaled such that the population generalized distance (Mahalanobis,
1936)

$$ \delta^2 = (\mu_1 - \mu_2)\, \Sigma^{-1} (\mu_1 - \mu_2)' $$   [32]

is fixed at a desired value. For example, if $\delta^2 = 1$ is the desired
generalized distance between subpopulations $k_1$ and $k_2$, the vectors
$\mu_{z1}$ and $\mu_{z2}$ are scaled such that the product in Equation 32
equals unity.

In this simulation, where p = 16 observed attributes and r = 3 common
factors, the same Z matrix was used for every random subpopulation. Hence,
the same covariance matrix $\Sigma$ characterized every random
subpopulation. For simplicity, the subpopulation $k_1$ centroid was set
equal to the null vector throughout this simulation. This simplification
does not limit the generality of the simulation because the important
quantity is the centroid difference $(\mu_1 - \mu_2)$.

A raw randomly directed 1-by-19 centroid vector $\mu_{z2}$ was generated.
This vector was substituted into Equation 31 to obtain a raw randomly
directed 1-by-16 centroid vector for subpopulation $k_2$. This raw
centroid vector was rescaled four times via Equation 32 to produce the
populations with the desired population generalized distances of 1, 2, 4,
and 8.

In sum, all four pairs of random subpopulations were characterized by
the same covariance matrix. The centroid vector for the first member of
each pair of subpopulations was the null vector. The four centroid
difference vectors are rescalings of each other, differing with respect to
the population generalized distance they produce when combined with the
population covariance matrix $\Sigma$ via Equation 32.
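
The construction of the random populations in Equations 29 through 32 can be sketched as below. The function name, the random seed, and the specific deviates are illustrative assumptions; only the dimensions (p = 16, r = 3) and the target distances come from the text.

```python
import numpy as np

def random_population(p, r, deltas, rng):
    """Build a random covariance matrix and rescaled centroids for k2."""
    z = rng.standard_normal((p, p + r))
    z /= np.linalg.norm(z, axis=1, keepdims=True)   # rows rescaled to unit length
    sigma = z @ z.T                                 # Eq. 29
    mu_z2 = rng.standard_normal(p + r)              # raw 1-by-(p+r) centroid
    raw = mu_z2 @ z.T                               # Eq. 31; k1 centroid is null
    raw_d2 = raw @ np.linalg.solve(sigma, raw)      # Eq. 32 for the raw vector
    # Rescale the raw centroid so Equation 32 equals each target distance.
    centroids = {d2: raw * np.sqrt(d2 / raw_d2) for d2 in deltas}
    return sigma, centroids

rng = np.random.default_rng(2)
sigma, centroids = random_population(p=16, r=3, deltas=(1, 2, 4, 8), rng=rng)
```

Each rescaled centroid points in the same direction as the raw one, so the four populations differ only in separation.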
Structured Populations 

Both subclasses of structured populations were constructed on a factor
analytic foundation (Thurstone, 1947). First, consider the general case:
constructing two p-dimensional normal subpopulations that have an equal
covariance matrix $\Sigma$ and centroids $\mu_1$ and $\mu_2$, respectively,
for which the population generalized distance is fixed at some desired
value. For simplicity, the r common factors are orthogonal. A hypothetical
p-by-r factor weight matrix A can be devised. The elements of this matrix are
scaled such that the sum of squares for the jth row of A equals the
communality of the jth predictor, $h_j^2$. The uniqueness of the jth
predictor, $u_j^2$, equals $1 - h_j^2$. Hence, there is a p-by-p diagonal
matrix U with elements $u_j$.

The common factor model (Thurstone, 1947) postulates that the r common
factors and the p unique factors are uncorrelated. Hence, the covariance
matrix $\Sigma$ can be expressed as

$$ \Sigma = [A \;\; U] \begin{bmatrix} A' \\ U' \end{bmatrix} = AA' + U^2 . $$   [33]



The centroids, $\mu_1$ and $\mu_2$, on the observed predictors for
subpopulations $k_1$ and $k_2$, respectively, are obtained via

$$ \mu_1 = [\mu_{a1} \;\; \mu_{u1}] \begin{bmatrix} A' \\ U \end{bmatrix} = \mu_{a1} A' + \mu_{u1} U $$   [34]

and

$$ \mu_2 = [\mu_{a2} \;\; \mu_{u2}] \begin{bmatrix} A' \\ U \end{bmatrix} = \mu_{a2} A' + \mu_{u2} U , $$   [35]

where the 1-by-r vector $\mu_{ak}$ and the 1-by-p vector $\mu_{uk}$ are the
centroids on the common factors and unique factors, respectively, for the
kth subpopulation. In these structured subpopulations, it is assumed that
centroid differences on the observed predictors are due solely to
differences on the r common factors. In other words, the difference vector
$(\mu_{u1} - \mu_{u2})$ is equal to the null vector. The scaling of the
common factor centroids $\mu_{a1}$ and $\mu_{a2}$ is such that the product

$$ \delta^2 = (\mu_1 - \mu_2)\, \Sigma^{-1} (\mu_1 - \mu_2)' $$   [36]

is fixed at a desired value.
In this simulation, where p = 16 observed predictors and r = 3 common
factors, the same A and U matrices were used for every structured
subpopulation in both subclasses of structured populations. Hence, the same
covariance matrix $\Sigma$ characterized every structured subpopulation.

The centroid vector for the first member of all eight pairs of
structured subpopulations was the null vector. As mentioned earlier in the
preceding section on random populations, this constraint does not interfere
with the generality of the simulation because the centroid difference
vector $(\mu_1 - \mu_2)$ is significant, not its constituent elements.

The feature that distinguishes between the two subclasses of structured
populations is the manner by which the second subpopulation centroids were
generated. Recall that in all structured populations, subpopulation
differences on the observed predictors are due solely to subpopulation
differences on the r common factors. In the fully structured subclass of
structured populations, the three common factors contribute equally to
subpopulation differences. In other words, the raw common factor centroid for
subpopulation $k_2$ in the fully structured subclass is a vector of ones.

To obtain a raw 1-by-16 centroid vector in the completely structured
populations, Equation 35 was used. This raw completely structured centroid
vector was rescaled via Equation 36 four times to obtain the desired
population generalized distances of 1, 2, 4, and 8.

In the partly structured subclass of populations, the raw common factor
centroid for subpopulation $k_2$ was placed in a random direction in the
three-dimensional common factor space. This random placement of the common
factor centroid sharply contrasts with its orderly placement in the
completely structured subclass of structured populations. For the partly
structured subclass, the 1-by-3 vector of ones $\mu_{a2}$ was replaced by
the randomly directed common factor centroid. Again, Equation 35 was used
to generate a raw observed attribute centroid vector $\mu_2$. The resultant
1-by-16 raw partly structured centroid vector was rescaled via Equation 36
four times to produce the desired population generalized distances of 1, 2,
4, and 8.
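
A sketch of the structured construction in Equations 33 through 36 follows. The factor weight matrix used in the study is not given in this section, so the loadings below are purely hypothetical (uniform random values chosen so that every communality stays below one); the function name is also an assumption.

```python
import numpy as np

def structured_population(a, mu_a2, deltas):
    """Covariance matrix (Eq. 33) and rescaled k2 centroids (Eqs. 35 and 36)."""
    h2 = np.sum(a ** 2, axis=1)                 # communalities h_j^2
    u = np.diag(np.sqrt(1.0 - h2))              # diagonal U with elements u_j
    sigma = a @ a.T + u @ u                     # Eq. 33
    raw = mu_a2 @ a.T                           # Eq. 35 with null unique centroids
    raw_d2 = raw @ np.linalg.solve(sigma, raw)  # Eq. 36 for the raw vector
    return sigma, {d2: raw * np.sqrt(d2 / raw_d2) for d2 in deltas}

rng = np.random.default_rng(3)
p, r = 16, 3
a = rng.uniform(0.2, 0.5, size=(p, r))          # hypothetical loadings, h_j^2 < 1
deltas = (1, 2, 4, 8)

# Fully structured: each common factor contributes equally (vector of ones).
sigma, fully = structured_population(a, np.ones(r), deltas)
# Partly structured: common factor centroid placed in a random direction.
_, partly = structured_population(a, rng.standard_normal(r), deltas)
```

Both subclasses share the same covariance matrix; only the direction of the common factor centroid differs.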

In sum, 12 pairs of subpopulations were constructed in this simulation.
All four pairs of random subpopulations were characterized by the same
covariance matrix. All eight pairs of structured subpopulations were
characterized by the same covariance matrix that differed from the first. The
centroid vector for the first member of all 12 pairs of subpopulations was
a null vector. Within each class (subclass) of population structure, the
four centroid vectors for the second member of the four pairs of
subpopulations were rescalings of each other, differing with respect to
subpopulation separation as measured by the population generalized distance.
These four levels of population generalized distance were 1, 2, 4, and 8.

Sampling and Computation of Classification Rules 

Random samples of equal size were drawn from each of the 24 subpopulations
at four levels of total sample size. The four total sample size levels,
N = 40, 80, 160, and 320, were chosen as representative of four sample size
to number of predictor (N to p) ratios often seen in practice. These N to p
ratios are 2.5, 5, 10, and 20. Since there are two subpopulations per total
population, the equal subpopulation sample sizes were
$n_1 = n_2 = 20$, 40, 80, and 160.
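
The arithmetic behind these ratios can be checked directly. The sketch below assumes p = 16 predictors, the length of the observed attribute centroid vector described earlier; the variable names are illustrative.

```python
p = 16                                 # number of predictors
total_sizes = [40, 80, 160, 320]       # total sample sizes N

ratios = [n / p for n in total_sizes]           # N to p ratios
group_sizes = [n // 2 for n in total_sizes]     # equal subpopulation sizes
```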

The sampling process involved the generation of sample within-groups
sums of products matrices, $\mathbf{WG}_1$ and $\mathbf{WG}_2$, and sample
centroids, $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$. Since the
predictors follow a multivariate normal distribution, the sampling
distributions for the $\mathbf{WG}_k$ are Wishart, depending only on the
population covariance matrix ($\boldsymbol{\Sigma}$), the sample size
($n_k$), and the number of predictors (p) (Wishart, 1928; Wijsman, 1959;
Odell and Feiveson, 1966). The sampling procedure used in this simulation
is very similar to that employed by Herzberg (1969). There are, however,
minor differences. Whereas Herzberg sampled a single covariance matrix per
sampling unit, two sums of products matrices, $\mathbf{WG}_1$ and
$\mathbf{WG}_2$, were generated per sampling unit in this investigation.
In addition, the widely known sampling distribution of the mean was used
in this research to generate two sample centroids, $\bar{\mathbf{x}}_1$
and $\bar{\mathbf{x}}_2$, per sampling unit.
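
A minimal sketch of one sampling unit follows. It is not the generator used in the study: instead of the decomposition methods in the cited references, it builds each sums of products matrix directly as a sum of n - 1 outer products of normal deviates, which has the same Wishart distribution with n - 1 degrees of freedom. The sample centroid is drawn separately from N(mu, Sigma / n), which is valid because the sample mean and the within-group sums of products are independent under normality. All names and the two-variable covariance matrix are hypothetical.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L' = sigma (sigma must be positive definite)."""
    p = len(sigma)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def normal_vector(low, rng):
    """One draw from N(0, sigma), given the Cholesky factor of sigma."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]

def sample_unit(mu, sigma, n, rng):
    """Draw (WG, xbar) for one subpopulation sample of size n."""
    p = len(mu)
    low = cholesky(sigma)
    wg = [[0.0] * p for _ in range(p)]
    for _ in range(n - 1):                 # n - 1 degrees of freedom
        x = normal_vector(low, rng)
        for i in range(p):
            for j in range(p):
                wg[i][j] += x[i] * x[j]
    # Sample centroid ~ N(mu, sigma / n), drawn independently of WG.
    e = normal_vector(low, rng)
    xbar = [m + v / math.sqrt(n) for m, v in zip(mu, e)]
    return wg, xbar

rng = random.Random(0)
sigma = [[1.0, 0.5], [0.5, 1.0]]           # hypothetical covariance matrix
wg1, xbar1 = sample_unit([0.0, 0.0], sigma, 20, rng)
wg2, xbar2 = sample_unit([1.0, 1.0], sigma, 20, rng)
```

Generating the matrices and centroids from their sampling distributions avoids drawing and storing the individual observations, which is the practical appeal of this style of simulation.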

Fifty replications at each of the four sample size levels were drawn 
from each of the 12 populations, yielding a total of 2400 pairs of 
simulated random samples. 
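
The size of the design can be confirmed by enumerating its cells. The sketch below assumes the 12 populations are the three classes of structure crossed with the four generalized distances; the labels are illustrative.

```python
from itertools import product

populations = [(structure, d2)
               for structure in ("random", "partly structured", "fully structured")
               for d2 in (1, 2, 4, 8)]
total_sizes = (40, 80, 160, 320)
replications = range(50)

# One entry per simulated pair of random samples.
design = list(product(populations, total_sizes, replications))
```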

For each pair of random samples, both reduced rank classification
procedures and the standard full rank procedure were used to develop sample
classification rules. The weights for the standard full rank
classification rule were obtained via (2). The weights for the principal
components and equal weights classification rules were obtained via (17)
and (27), respectively. Since the sampling process in this simulation was
of a stratified random nature, the estimate used for the population total
covariance matrix by the reduced rank classification procedures was that
depicted in (5).

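For each of the sample classification rules, the actual error rate
associated with that rule was computed via (Lachenbruch, 1975)

$$\text{actual error rate} = \tfrac{1}{2}\left[\Phi\!\left(\frac{-W^{*}[\boldsymbol{\mu}_1]}{\sqrt{V_{W^{*}}}}\right) + \Phi\!\left(\frac{W^{*}[\boldsymbol{\mu}_2]}{\sqrt{V_{W^{*}}}}\right)\right] \qquad [38]$$

where $\Phi[z]$ is the cumulative normal distribution function evaluated at
$z$ and $W^{*}[\boldsymbol{\mu}_k]$ equals either $W[\boldsymbol{\mu}_k]$,
$W_{PC}[\boldsymbol{\mu}_k]$, or $W_{EW}[\boldsymbol{\mu}_k]$, which can be
obtained from Equations 1, 18, or 28, respectively. In Equation 38, the
term $V_{W^{*}}$ is the variance of the linear composite formed by using
$\mathbf{b}^{*}$ in the population,

$$V_{W^{*}} = \mathbf{b}^{*}\boldsymbol{\Sigma}\mathbf{b}^{*\prime}, \qquad [39]$$

where $\mathbf{b}^{*}$ equals either $\mathbf{b}$, $\mathbf{b}_{PC}$, or
$\mathbf{b}_{EW}$, which are defined in Equations 2, 17, or 27,
respectively. The actual error rate serves as the major dependent variable
for assessing the performances of the two reduced rank procedures as
potential alternatives to the standard full rank classification procedure.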
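
The computation of an actual error rate can be sketched as follows. The sketch assumes equal prior probabilities and a rule that assigns an observation to the first subpopulation when b.x - cutoff is positive; this sign convention, the function name, and the two-predictor population are assumptions, since Equations 1 and 2 are not reproduced in this section.

```python
from math import sqrt
from statistics import NormalDist

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def actual_error_rate(b, cutoff, mu1, mu2, sigma):
    """Error rate of the rule 'assign to subpopulation 1 if b.x - cutoff > 0'.

    Assumes equal prior probabilities and normal subpopulations with
    centroids mu1, mu2 and common covariance matrix sigma.
    """
    # Variance of the linear composite in the population: b sigma b'
    v = dot(b, [dot(row, b) for row in sigma])
    sd = sqrt(v)
    phi = NormalDist().cdf
    w1 = dot(b, mu1) - cutoff      # rule evaluated at centroid 1
    w2 = dot(b, mu2) - cutoff      # rule evaluated at centroid 2
    return 0.5 * (phi(-w1 / sd) + phi(w2 / sd))

# Hypothetical two-predictor population with generalized distance 4.
sigma = [[1.0, 0.0], [0.0, 1.0]]
mu1, mu2 = [0.0, 0.0], [2.0, 0.0]
b = [-2.0, 0.0]                    # population-optimal weights, mu1 - mu2
cutoff = dot(b, [1.0, 0.0])        # b evaluated at the midpoint of the centroids
rate = actual_error_rate(b, cutoff, mu1, mu2, sigma)

# A rule with weights pointing in a poorer direction has a higher error rate.
poor_rate = actual_error_rate([-1.0, -1.0], -1.0, mu1, mu2, sigma)
```

Because the population parameters are known in a simulation, the error rate of each sample rule can be evaluated exactly in this way, with no need for a holdout sample.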

Results 

Summary information relevant to assessing the usefulness of the two
reduced rank procedures is provided in Tables 1, 2, 3, and 4. First, the
content of Table 1 is discussed. In this table, the optimal error rates
for each type of classification rule in each of the 12 populations are
presented. The optimal error rate is the probability of misclassification
associated with the use of the optimal population classification rule in
the population. In other words, the optimal error rate is the lowest
possible error rate attainable in the population for a particular type of
classification rule.

Examination of Table 1 reveals that, in every population, the standard
classification procedure, as expected, has the smallest optimal error rate,
and that this error rate is independent of the structure of the population.
For the standard classification procedure, the optimal error rates range
from a high of .31 in populations where the generalized distance is 1 to a
low of .08 in populations where the generalized distance is 8.
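
These values follow from a standard result: for two normal subpopulations with a common covariance matrix and equal prior probabilities, the optimal linear rule has error rate Phi(-delta / 2), where delta squared is the population generalized distance. The sketch below assumes equal priors and treats the reported distances of 1, 2, 4, and 8 as squared distances; under those assumptions it reproduces the error rates reported for the standard procedure in Table 1.

```python
from math import sqrt
from statistics import NormalDist

def optimal_error_rate(d2):
    """Error rate of the optimal linear rule at squared generalized distance d2."""
    return NormalDist().cdf(-sqrt(d2) / 2)

rates = {d2: round(optimal_error_rate(d2), 2) for d2 in (1, 2, 4, 8)}
# rates == {1: 0.31, 2: 0.24, 4: 0.16, 8: 0.08}
```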

In contrast to the standard classification procedure, which makes no
structural assumptions about the population, the two reduced rank
classification procedures are sensitive to the structural characteristics
of the population. The equal weights classification procedure is
particularly sensitive to population structure. In the random populations,
the optimal error rates for the equal weights classification rules range
from .47 when the population generalized distance is 1 to .44 when the
population generalized distance is 8. In the partly structured
populations, the performance of equal weights classification improves, yet
remains noticeably poorer than that of the other two procedures,
particularly for large population generalized distances. The optimal
performances of the equal weights classification rules in the fully
structured populations provide sharp contrasts to their optimal
performances in the other two classes of populations. In this subclass of
structured populations, the differences between the optimal performances of
the three types of classification procedures are negligible.

Table 1

Optimal Error Rates for the Three Classification Procedures
in the Three Classes of Population Structure at the
Four Levels of Generalized Distance

                            Population Structure

                    Random        Partly         Fully
  Rule              Structure     Structured     Structured

$\delta_p^2$ = 1
  ST                .31           .31            .31
  PC                .45           .31            .31
  EW                .47           .36            .31

$\delta_p^2$ = 2
  ST                .24           .24            .24
  PC                .42           .24            .24
  EW                .46           .32            .24

$\delta_p^2$ = 4
  ST                .16           .16            .16
  PC                .37           .16            .16
  EW                .45           .26            .16

$\delta_p^2$ = 8
  ST                .08           .08            .08
  PC                .25           .08            .08
  EW                .44           .20            .08

ST = Standard Classification Rule
PC = Principal Components Rule With Three Components
EW = Equal Weights Rule

The principal components classification procedure performs as well as the
standard full rank rule in both subclasses of structured populations. In
the random populations, the principal components classification rules are
clearly inferior to the standard classification rule. In contrast to the
equal weights procedure, however, the principal components procedure
exhibits noticeable improvement as the population generalized distance
increases in the random populations.

Performance in Random Populations

Table 2 summarizes the performances of the three types of classification
procedures in the four random populations. It contains the mean actual
error rates and associated standard deviations of the three types of
classification procedures for every combination of population generalized
distance and total sample size. Each entry in this table is based on fifty
replications.

As predicted, the standard full rank classification procedure yields
the lowest mean actual error rates in all random populations at all four
levels of sample size. Clearly, the two reduced rank classification
procedures are inappropriate for this class of populations. To their
credit, however, the mean actual error rates for both reduced rank
procedures exhibit little sensitivity to changes in total sample size. In
contrast, the standard classification procedure is sensitive to changes in
total sample size. The relative insensitivity of the reduced rank
procedures to sample size can hardly compensate, however, for the poor
actual error rates that these procedures exhibit in the random populations.
Clearly, neither reduced rank alternative can be preferred over the
standard classification procedure in this class of random populations.

Table 2

Summary Statistics for the Three Classification Procedures at
Various Generalized Distances ($\delta_p^2$) and Total Sample Sizes (N)
in the Random Populations

                         Classification Procedure

             Standard          Principal         Equal
             Procedure         Components        Weights
    N        MEAN     SD       MEAN     SD       MEAN     SD

$\delta_p^2$ = 1
    40       .40      .03      .46      .03      --       --
    80       .37      .02      .46      .02      --       --
   160       .34      .01      .46      .02      .48      .02
   320       .33      .01      .46      .01      .48      .01

$\delta_p^2$ = 2
    40       .35      .04      .43      .03      .48      .03
    80       .30      .02      .43      .03      .48      .03
   160       .27      .01      .43      .03      .48      .03
   320       .26      .01      .43      .02      .47      .02

$\delta_p^2$ = 4
    40       .26      .04      .37      .06      .48      .04
    80       .20      .01      .37      .04      .4?      .03
   160       .18      .01      .36      .03      .46      .02
   320       .17      .00      .35      .02      .46      .01

$\delta_p^2$ = 8
    40       .16      .03      .24      .06      .46      .04
    80       .12      .02      .24      .05      .45      .03
   160       .10      .01      .24      .04      .44      .01
   320       .09      .00      .24      .03      .44      .01

The principal components rules retained three components.

Performance in Fully Structured Populations

Table 3 summarizes the performances of the three types of classification
procedures in the four fully structured populations. It is identical in
format to Table 2. Recall that the covariance matrix in the fully
structured populations can be described by three common factors and
16 unique factors. In addition, subpopulation differences are due solely
to differences on the common factor centroids, with each common factor
contributing equally to the common factor centroid differences.

The performances of the standard classification procedure in these four 
fully structured populations are very similar to the performances it exhib- 
ited in the random populations. For example, its sensitivity to changes in 
total sample size remains evident. While the performances of the standard 
classification procedure in the random populations were clearly superior to 
those of the two reduced rank procedures, the same pattern of performance 
is clearly inferior to the performance patterns of the two reduced rank
procedures in these fully structured populations. 

Both reduced rank classification procedures perform well in the fully
structured populations, with the edge going to equal weights classification
because of its remarkable performances. At each combination of sample size
and population generalized distance, the mean actual error rate for the
equal weights classification rule is equal, to two decimal places, to the
population optimal error rate, and the associated standard deviation is, to
two decimal places, zero. Even at N = 40, the sample equal weights
classification rules yield a mean actual error rate equal to the optimal
error rate in the population, which is the lowest error rate possible.

Table 3

Summary Statistics for the Three Classification Procedures at
Various Generalized Distances ($\delta_p^2$) and Total Sample Sizes (N)
in the Fully Structured Populations

                         Classification Procedure

             Standard          Principal         Equal
             Procedure         Components        Weights
    N        MEAN     SD       MEAN     SD       MEAN     SD

$\delta_p^2$ = 1
    40       .40      .04      .33      .02      .31      .00
    80       .38      .02      .32      .01      .31      .00
   160       .34      .01      .32      .01      .31      .00
   320       .33      .01      .31      .00      .31      .00

$\delta_p^2$ = 2
    40       .34      .04      .26      .02      .24      .00
    80       .30      .02      .25      .01      .24      .00
   160       .27      .01      .24      .00      .24      .00
   320       .26      .01      .24      .00      .24      .00

$\delta_p^2$ = 4
    40       .26      .03      .18      .01      .16      .00
    80       .21      .02      .17      .01      .16      .00
   160       .18      .01      .16      .00      .16      .00
   320       .17      .00      .16      .00      .16      .00

$\delta_p^2$ = 8
    40       .16      .03      .09      .01      .08      .00
    80       .12      .02      .08      .00      .08      .00
   160       .09      .01      .08      .00      .08      .00
   320       .09      .00      .08      .00      .08      .00

The principal components rules retained three components.

Performance in the Partly Structured Populations

Table 4 summarizes the performances of the three types of classification
procedures in the partly structured populations. It is identical in
format to Tables 2 and 3. Recall that the only difference between the
partly structured populations and the fully structured populations is that
the common factor centroid differences in the former are randomly directed
in the three-dimensional common factor space.

In the partly structured populations, the equal weights procedure is
not appropriate. Hence, it performs poorly. It is inferior to the
principal components procedure at all levels of sample size and population
generalized distance, and it is inferior to the standard classification
procedure at most combinations of generalized distance and sample size.

The patterns of performance for both the standard classification
procedure and the principal components procedure are very similar to those
patterns observed in the fully structured populations. The standard
procedure retains its sensitivity to changes in sample size. The principal
components procedure is unquestionably the preferred alternative to the
standard classification procedure in these partly structured populations.

Table 4

Summary Statistics for the Three Classification Procedures at
Various Generalized Distances ($\delta_p^2$) and Total Sample Sizes (N)
in the Partly Structured Populations

                         Classification Procedure

             Standard          Principal         Equal
             Procedure         Components        Weights
    N        MEAN     SD       MEAN     SD       MEAN     SD

$\delta_p^2$ = 1
    40       .40      .03      .33      .02      .38      .04
    80       .37      .02      .32      .01      .37      .00
   160       .34      .01      .32      .01      .37      .00
   320       .33      .01      .31      .00      --       .00

$\delta_p^2$ = 2
    40       .34      .04      .26      .02      .33      .05
    80       .30      .02      .25      .01      .32      .00
   160       .27      .01      .25      .01      .32      .00
   320       .26      .01      .24      .00      .32      .00

$\delta_p^2$ = 4
    40       .26      .04      .18      .01      .27      .01
    80       .21      .02      .17      .01      .26      .01
   160       .18      .01      .16      .00      .26      .00
   320       .17      .00      .16      .00      .27      .00

$\delta_p^2$ = 8
    40       .16      .04      .09      .02      .20      .01
    80       .11      .01      .08      .00      .20      .01
   160       .10      .01      .08      .00      .20      .01
   320       .09      .00      .08      .00      .20      .00

The principal components rules retained three components.

Discussion

The adequacies of the two reduced rank alternatives to standard full
rank classification were dependent on the structure of the predictor
battery and the nature of subpopulation differences on the predictors.
Both reduced rank procedures were inappropriate in the random populations;
both were appropriate in the fully structured populations. In the partly
structured populations, the equal weights classification procedure
performed poorly, while the performance of the principal components
classification procedure was very good. The performance of the standard
full rank procedure was invariant with respect to population structure, but
exhibited a disturbing dependence on sample size.

Populations with random structures are seldom seen in practice because
predictor batteries having the necessary features of a random population
are difficult to construct. It is conceivable that a random structure
might result from combining a jumble of measures for the purpose of trying
to see how things "fall out." Even such a hodgepodge predictor battery may
exhibit an artificial structure imposed by unwanted factors such as method
variance. The standard classification procedure should be superior to the
two reduced rank procedures in data sets that approximate random
populations.

In the applied behavioral sciences, it is fairly common to observe data
sets that are structured and predictor batteries that are amenable to
reduced rank approximation. In these settings, the reduced rank
classification procedures can utilize knowledge of the structural aspects
of the data to generate classification rules that exhibit better
stabilities (lower actual error rates) than the standard classification
rules. The adequacies of the reduced rank procedures depend upon the
appropriateness of the prior structural information that is incorporated
into the process of generating classification rules by the procedures.

In the fully structured populations, both reduced rank classification
procedures exhibited better stabilities than the standard procedure. The
performance of the equal weights procedure was particularly remarkable. Of
the three classification procedures, the equal weights procedure is the
least sensitive to derivation sample information. The rule for generating
the equal weights composite is determined a priori and exerts a
considerable amount of influence on the orientation of the equal weights
composite in the total predictor space. This influence is based on the
implicit assumption that subpopulation differences on all observed
attributes are in the same direction. In the fully structured populations,
this implicit assumption was true; hence, the very stable performance of
the equal weights procedure.
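
The role of this directional assumption can be made concrete with a small sketch. For any fixed weight vector b and a cutoff midway between the two centroids, the error rate is Phi(-|b.delta| / (2 sqrt(b Sigma b'))), where delta is the centroid difference. The example below uses unit weights as a stand-in for the equal weights rule of Equation 27, which is not reproduced in this section, together with a hypothetical four-predictor population having an identity covariance matrix; both are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def composite_error_rate(b, delta, sigma):
    """Error rate of the midpoint rule built on the composite b.x.

    delta is the centroid difference; equal priors and a common
    covariance matrix sigma are assumed.
    """
    separation = abs(sum(w * d for w, d in zip(b, delta)))
    variance = sum(b[i] * sigma[i][j] * b[j]
                   for i in range(len(b)) for j in range(len(b)))
    return NormalDist().cdf(-separation / (2 * sqrt(variance)))

p = 4
sigma = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
ones = [1.0] * p

# Both centroid differences have squared generalized distance 4.
same_direction = [1.0, 1.0, 1.0, 1.0]
mixed_direction = [1.0, -1.0, 1.0, 1.0]

aligned = composite_error_rate(ones, same_direction, sigma)
misaligned = composite_error_rate(ones, mixed_direction, sigma)
```

In this toy population the unit weights composite attains the optimal error rate when all differences share a sign, and roughly doubles its error rate when a single difference is reversed, even though the generalized distance between the subpopulations is unchanged.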

In the partly structured populations, the implicit assumption of a
common direction for subpopulation mean differences was wrong. The equal
weights procedure performed poorly in this class of populations. In
contrast, the principal components procedure performed as well in the
partly structured populations as it did in the fully structured
populations. The stable performances of the principal components
classification procedure in these two subclasses of structured populations
hinged on its capacity to use information about the effective
dimensionality of the predictor space and on the appropriateness of this
information. In both subclasses of structured populations, the decision to
retain three components was appropriate; hence, the principal components
rules exhibit more stability than the full rank rules. In the random
populations, however, the decision to retain three components was incorrect
and, consequently, the principal components procedure performed poorly.

In sum, the adequacy of the two reduced rank procedures depends upon 
the appropriateness of the prior information that they incorporate into the 
process of generating the classification rules. The equal weights proce- 
dure requires appropriate information about the directionality of sub- 
population differences. The principal components procedure requires appro- 
priate information about the effective dimensionality of the predictor 
space. In contrast, the standard full rank procedure does not require 
either type of information and is considerably more sample dependent, yet
less susceptible to poor performance because of inappropriate assumptions 
about the structural characteristics of the data. 

Future Research 

At this point, it is important to recognize that these simulation
results should not be generalized in a thoughtless fashion. Each of the
three classes of population structure was represented by a single
replication. The intent of this research was not to generate prescriptions
that are applicable to any conceivable data set. Rather, this article
introduced two reduced rank classification procedures and demonstrated that
there are situations in which these two procedures are feasible
alternatives to the standard classification procedure. Clearly, there is a
need for future research that would expand the boundaries of this
simulation and provide a more extensive specification of the conditions
under which the two reduced rank procedures can be expected to perform
better than the full rank procedure.

Examination of the sensitivity of the principal components procedure to
incorrect decisions about the number of components to retain for
classification purposes is one area for future research. Extension of
these reduced rank procedures to multiple group cases also merits further
investigation. In addition, the usefulness of these procedures in real
data sets should be examined. Clearly, there exist many avenues for future
research on the two reduced rank classification procedures that were
described and investigated in this article.